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Abstract 

When modeling co-expression networks from 
high-throughput time course data, the Pear- 
son correlation function is reasonably effective. 
However, this approach is limited since it cannot 
capture non-linear interactions and time shifts 
between the signals. Here we propose to over- 
come these two issues by employing a novel sim- 
ilarity function, DTWMIC, combining a measure 
taking care of functional interactions of signals 
(MIC) and a measure identifying horizontal dis- 
placements (DTW). By using a network compar- 
ison metric to quantify differences, we show the 
effectiveness of the DTWMIC approach on both 
synthetic and transcriptomic datasets. 

1 Introduction 

Inferring a biological graph (e.g., a Gene Regula- 
tory Network) from high-throughput logitudinal mea- 
surements of its nodes is nowadays one of the ma- 
jor challenges in computational biology, and several are 
the proposed solutions to this still unanswered question 
l |De Smet and Marchal, 20101 |Marbach et al, 2010J . Al- 
though the presence of many indirect interactions set 
the problem in a strongly non-linear domain, simple ba- 
sic approaches such as the coexpression networks via 
correlation measures proved to work reasonably well 
| |Allen et al, 201 2], However, Pearsons correlation lacks 
sensitivity in case of non-linear relations between signals 
and when signals are mutually delayed, thus the reliability 
of a coexepression network would benefit from being built 
by a measure taking care of these issues. A first example in 
this direction can be found in lEl Bakry et al, 20l0) , where 
the authors envisioned an ad-hoc technique to cope with 
the time lag. 

Here we follow a different approach introducing 
a novel similarity measure DTWMIC, whose com- 
ponents are the Dynamic Time Warping (DTW, 
[ S ak oe and Chiba, 1978|) and the M aximal Informa- 
tion Coefficient (MIC, | |Reshef et al, 20TT] ). While DTW 
can take care of the time-shifts, the MIC metric can detect 
functional non-linear interactions between the signals. 
Moreover, the presence of MIC alleviates the limitations 
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affecting DTW when signals, although related, present 
different levels of expression. To support the claim of 
better reconstruction performances (in terms of smaller 
HIM distances from the gold standard), we use three 
synthetic datasets generated by GeneNetWeaver (GNW, 
| |Scha ffter et a l, 201 1) ) following the guidelines of the 
DREAM Challenge l |Prill et al, 2"0T0) , and a transcrip- 
tomic dataset on the expression response of human T cells 
to PMA and ionomicin treatment [Rangel et al, 20041. 

2 Methods 

2.1 Coexpression Networks 

Correlation methods like WGCNA 

[ Zhang and Horvath, 2005) represent the most direct 
approach to the exploration of the gene co-expression 
network. The adjacency matrix is defined by computing 
a correlation function (absolute Pearson for WGCNA) 
between all pairs of gene signals G,;, Gj, soft thresholded 
(for instance by a power function) to determine the 
biological meaningfulness of the connections: 

<Hi =f(Gi, Gjf. 
These co-expression-based methods have been used in sev- 
eral studies and have shown their usefulness in interpreting 
biological results and identifying important gene modules. 
Therefore, we will use WGCNA as a reference method for 
the correlation-based approach. 

To take care of non-linear interactions and time shifts 
in the gene signals, we propose to employ the DTWMIC 
function: 

a, j = DTWMIC (G 4 , Gj) 

= (-^DTW.(G if Gj) 2 + MIC(G 1 ,G 2 )A " . 

In all experiments we set f3 = 6 (as in the WGNCA ap- 
proach) for both Pearson and the DTW-MIC measures. 
Comparison with gold standard network is quantitatively 
assessed in terms of HIM network distance. Details on 
DTW, MIC and HIM are provided hereafter. 

2.2 Maximal Information Coefficient 

The Maximal Information Coefficient (MIC) measure is 
a component of the Maximal Information-based Nonpara- 
metric Exploration (MINE) family of statistics, introduced 



in | |Reshef et al, 2011] Speed, 201 1) , for the exploration 
of two-variable relationships in multidimensional data sets, 
having the generality and equitability property. The MIC 
value is obtained by builiding several grids at different res- 
olutions on the scatterplot of the two variables and com- 
puting the largest possible mutual information achievable 
by any grid applied to the data, and then normalizing to the 
[0,1] range. 

The two distinctive features of MIC are generality, i.e., 
the ability of capturing variable relationships of differ- 
ent nature and equitability, that is the property of pe- 
nalizing similar levels of noise in the same way, regard- 
less of the nature of the relation between the variables. 
MIC can be computed in R by using the minerva package 
| |Albanese ef al, 2012| . 

2.3 Dynamic Time Warping 

The Dynamic Time Warping (DTW) 



l |Keogh and Pazzani, 1998} |Keogh and Pazzani, 2 0001 



is a measure of distance between two sequences consid- 
ering occurring time shifts between the series. Thus it 
proves more suitable than the Euclidean metric in curve 
comparison because it takes into account the shapes of the 
curves instead of just evaluating the pointwise distance of 
the vectors. 

The DTW algorithm also finds an optimal match be- 
tween the two given series by non-linearly warping them 
in the time dimension to determine a measure of their 
dissimilarity, stretching (or compressing) the time axis. 
As a comprehensive reference, the reader is referred to 
I IGusfield, 19971 . 

DTW elective application is the comparison of dif- 
ferent speech patterns in automatic speech recogni- 
tion (Sakoe and Chiba, 1978| , but it has been also 
used in functional genomics [ Aach and Church, 2 001 ; 
Furlanell o et al, 2006| . To obtain a similarity measure, we 
use the function DTW S = 1/(1 + DTW d ), where DTW d is 
the normalized distance between two series, as computed 
in the R package dtw [Giorgi no, 2009) . 

2.4 Hamming Ipsen Mikhailov Distance 

The HIM distance for network comparison is de- 
fined as the product metric of the Hamming distance 
H | |Tun et al, 2006] |Dougherty, 20K)) and the I psen- 
Mikhailov distance IM~1 Ipsen and Mikhailov, 2002], nor- 
malized by the factor V2 to set its upper bound to 1 : 



HIM(iVi,iV2) 



-^v/H^iVs) 2 



•IM(iVi,iV 2 ) 2 



for Ni : N% two undirected (possibly weighted) networks. 
The drawback of edit distances (such as H) is locality, 
as it focuses only on the network parts that are differ- 
ent in terms of presence or absence of matching links 
yjurman et al, 20 fl] . 

Spectral distances like IM are global, since they take into 
account the whole graph structure, but they cannot distin- 
guish isomorphic or isospectral graphs, which can corre- 
spond to quite different conditions within the biological 
context. 

Thus the HIM distance is a possible solution for both 
issues: details about HIM and its two components H and 
IM are given in Uu rman et al, 2012) . 



3 Results 

3.1 GeneNetWeaver data 

GNW allows the construction of DREAM-like net- 
works and corresponding simulated expressions (steady 
states and time course) generated as subnetworks of 
the E. coli [Gama-Castro et al, 2008| or the Yeast 
iBalaji et al, 2006] transcriptional regulatory network, 



with the possibility of choosing a number of param- 
eters both for the subnetwork and the data structure. 
In particular, we generated three synthetic networks 
Yeast2o,Ecoli2o, Ecoliso, where the subscript indicates the 
number of nodes of the the net, extracted randomly from 
the whole set of nodes. In each case, half of the selected 
nodes are regulators. 

For each datasets, we generated 10 replicates of logitudi- 
nal datasets, ccreated by a dynamic model mixing ordinary 
and stochastic differential equations, with 41 time points 
equally spaced between time and time 1000 and affected 
by 0.5% (Yeast) or 1% noise (E. coli). In each dataset, 
the first half of the time series shows the response of the 
network to a perturbation (at t=0 is the wild-type steady- 
state), then the perturbation is removed and the second half 
of the time series shows how the gene expression levels go 
back from the perturbed to the wild-type state, (this is the 
DREAM4 setup for the expression time course in GNW). 

For each experiment, we built the corresponding Pearson 
and DTWMIC coexpression networks, and we computed 
the HIM distance of both the inferred networks from the 
GNW gold standard. The results are reported in Table Q] 
and they show that the DTWMIC networks are consistently 
closer to the gold standard than the corresponding Pearson 
WGCNA graph. 

For the Yeast2o dataset, we further generated 4 time 
course datasets of the same length as above, but with a dual 
gene knockout. Also in this configuration, the DTWMIC 
inferred networks were closer to the gold standard: in the 
4 experiments, their HIM distances were 0.23, 0.21, 0.24 
and 0.21, while for the Pearson correlation networks the 
corresponding values were 0.30, 0.27, 0.31 and 0.47 re- 
spectively. 
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Table 1: HIM distances of the DTWMIC (D) and the 
WGCNA Pearson (P) networks for all experiments on the 
GNW datasets Yeast2o, Ecoli 2 o, Ecoliso 



(a) (b) (c) 

Figure 1: The T-cell network: gold standard (a), Pearson correlation network (b) and DTWMIC coexpression network (c). 
The gold standard has unweighted links, and the node color is red for degree > 10, orange for 5 < degree < 10 and yellow 
for node degree < 5. For the Person correlation network we show only the links whose weight is larger than 0.05; nodes 
are red when degree > 20, orange for 10 < degree < 20 and yellow for degree < 10. For the DTWMIC network we show 
only the links whose weight is larger than 0. 15; node colors are the same as in the Pearson net. 



3.2 T-cell data 



In the paper [Rangel et al, 2004.1, the authors investigated 
the response of a human T-cell line (Jirkat) to a treatment 
with PMA and ioconomin by measuring the expression of 
58 genes across 10 time points (0, 2, 4, 6, 8, 18, 24, 32, 
48, and 72 hours after treatment) with 34 replicates (data 
available in the R package longitudinal). Opgen-Rhein 



and Strimmer in | Opgen-Rhein and Strimmer, 2006b 



Opg en-Rhein and Strimmer, 2006a) constructed the corre 
sponding network by shrinkage estimation of the (partial) 
dynamical correlation, which we consider here as the 
ground truth network, displayed in Figure[Tfa). 

We then built, starting from the same dataset, the corre- 
lation networks inferred by Pearson and DTWMIC, plot- 
ted respectively in Figure[T| panels (b) and (c) respectively. 
Again, the DTWMIC network results closer to the gold 
standard than the Pearson net, with a HIM value of 0.164 
versus 0.214. 

Moreover, it is worthwhile noting what happens consid- 
ering separately the two components of the HIM distance in 
this case: while the Ipsen Mikhailov distance is still smaller 
for the DTWMIC network (IM=0.203 vs. IM=0.296), the 
Hamming distance is larger (H=0.1 12 vs. H=0.06). This 
yields that there is a smaller number of links changing be- 
tween the Pearson coexpression network and the gold stan- 
dard, but these changing links induce a strongly different 
structure between the two graphs. 

4 Conclusion 

We introduced here DTWMIC, a novel measure for infer- 
ring coexpression networks from longitudinal data as an 
alternative to the absolute Pearson correlation used in the 
WGCNA approach. Due to the nature of its components 



Dynamic Time Warping and Maximal Information Coef- 
ficient, the DTWMIC similarity can overcome the well 
known limitations of the Pearson correlation when deal- 
ing with horizontally displaced signals and indirect interac- 
tions. Experiments on biologically inspired synthetic data 
and gene expression time course show the higher precision 
in the network inference achieved by DTWMIC with re- 
spect to the Pearson correlation in different conditions. 
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